
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World

Semnani, Sina J., Zhang, Pingyue, Zhai, Wanyue, Li, Haozhuo, Beauchamp, Ryan, Billing, Trey, Kishi, Katayoun, Li, Manling, Lam, Monica S.

arXiv.org Artificial Intelligence

This paper presents LEMONADE, a large-scale conflict event dataset comprising 39,786 events across 20 languages and 171 countries, with extensive coverage of region-specific entities. LEMONADE is based on a partially reannotated subset of the Armed Conflict Location & Event Data (ACLED), which has documented global conflict events for over a decade. To address the challenge of aggregating multilingual sources for global event analysis, we introduce abstractive event extraction (AEE) and its subtask, abstractive entity linking (AEL). Unlike conventional span-based event extraction, our approach detects event arguments and entities through holistic document understanding and normalizes them across the multilingual dataset. We evaluate various large language models (LLMs) on these tasks, adapt existing zero-shot event extraction systems, and benchmark supervised models. Additionally, we introduce ZEST, a novel zero-shot retrieval-based system for AEL. Our best zero-shot system achieves an end-to-end F1 score of 58.3%, with LLMs outperforming specialized event extraction models such as GoLLIE. For entity linking, ZEST achieves an F1 score of 45.7%, significantly surpassing OneNet, a state-of-the-art zero-shot baseline that achieves only 23.7%. However, these zero-shot results lag behind the best supervised systems by 20.1% and 37.0% in the end-to-end and AEL tasks, respectively, highlighting the need for further research.
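At its core, the abstractive entity linking (AEL) subtask is retrieval: match the context around a mention against entity descriptions and return the best-scoring knowledge-base entry. Below is a minimal, hypothetical sketch using plain bag-of-words cosine similarity; the scoring function and the toy KB entries are illustrative assumptions, not ZEST's actual retriever or data.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words token counts (a deliberately simple stand-in
    for the learned retrievers a real system would use)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_entity(mention_context, kb):
    """Return the KB entity whose description best matches the
    mention's surrounding context."""
    mv = vectorize(mention_context)
    return max(kb, key=lambda name: cosine(mv, vectorize(kb[name])))

# Toy knowledge base (illustrative entries only).
kb = {
    "ACLED": "armed conflict location and event data project",
    "UNHCR": "united nations agency for refugees and displaced people",
}
print(link_entity("event data on armed conflict locations", kb))  # → ACLED
```

A real AEL system must also normalize mentions across 20 languages, which is exactly where dense multilingual retrievers replace the token-overlap scoring sketched here.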


The Robustness of Structural Features in Species Interaction Networks

Fard, Sanaz Hasanzadeh, Dolson, Emily

arXiv.org Artificial Intelligence

Species interaction networks are a powerful tool for describing ecological communities; they typically contain nodes representing species, and edges representing interactions between those species. For the purposes of drawing abstract inferences about groups of similar networks, ecologists often use graph topology metrics to summarize structural features. However, gathering the data that underlies these networks is challenging, which can lead to some interactions being missed. Thus, it is important to understand how much different structural metrics are affected by missing data. To address this question, we analyzed a database of 148 real-world bipartite networks representing four different types of species interactions (pollination, host-parasite, plant-ant, and seed-dispersal). For each network, we measured six different topological properties: number of connected components, variance in node betweenness, variance in node PageRank, largest eigenvalue, number of non-zero eigenvalues, and community structure as determined by four different algorithms. We then tested how these properties change as additional edges -- representing data that may have been missed -- are added to the networks. We found substantial variation in how robust different properties were to the missing data. For example, the Clauset-Newman-Moore and Louvain community detection algorithms showed much more gradual change as edges were added than the label propagation and Girvan-Newman algorithms did, suggesting that the former are more robust. Robustness also varied for some metrics based on interaction type. These results provide a foundation for selecting network properties to use when analyzing messy ecological network data.
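The robustness test described here can be sketched in a few lines: repeatedly add random "missed" edges to a network and track how a topological property drifts. The sketch below uses the simplest listed property, the number of connected components (counted via union-find); the two-triangle toy graph is an illustrative assumption, not a network from the paper's database.

```python
import random

def count_components(n, edges):
    """Union-find count of connected components on n nodes."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def robustness_curve(n, edges, n_added, trials=20, seed=0):
    """Mean component count after adding k random missing edges,
    mimicking interactions the field data may have missed."""
    rng = random.Random(seed)
    present = set(map(frozenset, edges))
    missing = [(u, v) for u in range(n) for v in range(u + 1, n)
               if frozenset((u, v)) not in present]
    curve = []
    for k in range(n_added + 1):
        total = 0
        for _ in range(trials):
            extra = rng.sample(missing, k)
            total += count_components(n, list(edges) + extra)
        curve.append(total / trials)
    return curve

# Two disconnected triangles: every missing edge bridges them,
# so one added edge always collapses the count from 2 to 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(robustness_curve(6, edges, 3))  # → [2.0, 1.0, 1.0, 1.0]
```

A metric whose curve stays flat as edges are added is robust to missing data; here the component count changes immediately, showing why the paper compares metrics rather than trusting any single one.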


The Unseen Targets of Hate -- A Systematic Review of Hateful Communication Datasets

Yu, Zehui, Sen, Indira, Assenmacher, Dennis, Samory, Mattia, Fröhling, Leon, Dahn, Christina, Nozza, Debora, Wagner, Claudia

arXiv.org Artificial Intelligence

Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet, ML tools can only be as capable as the quality of the data they are trained on allows them to be. While there is increasing evidence that they underperform in detecting hateful communication directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.


Unlock the Future of Autonomous Drones with Innovative Secure Runtime Assurance (SRTA)

IEEE Spectrum Robotics



A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

Liang, Biyonka, Xu, Lily, Taneja, Aparna, Tambe, Milind, Janson, Lucas

arXiv.org Artificial Intelligence

In these settings, such as communicable disease management (Tuldrà et al., 1999; Killian et al., 2019), prenatal and infant care (Hegde & Doshi, 2016; Ope, 2020; Bashingwa et al., 2021), and cancer prevention (Wells et al., 2011; Lee et al., 2019), beneficiaries may at any time enter an adhering (e.g., following their treatment regimen) or non-adhering (e.g., missing a treatment) state. As adherence is often vital for ensuring certain health outcomes, programs may allocate resources or interventions to patients at risk of drop-out from the program due to continued non-adherence. We can model this problem as a restless multi-armed bandit (RMAB) by representing each beneficiary as an arm, their adherence status as the state of the corresponding MDP, and the allocation of an intervention as the action. The underlying transition dynamics are often unknown a priori, requiring online reinforcement learning (RL). However, existing methods in online RL for RMABs cannot incorporate properties often present in real-world public health applications, such as contextual information and non-stationarity. We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model a wide range of complex RMAB settings, such as contextual and non-stationary RMABs.
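As a rough illustration of the Thompson-sampling ingredient (a generic sketch, not BCoR itself, whose Bayesian model additionally handles context and non-stationarity), one can keep a Beta posterior per arm over the chance that an intervention leads to adherence, sample from each posterior, and spend the intervention budget on the highest draws:

```python
import random

class ThompsonRMAB:
    """Toy Thompson sampling for a binary-state RMAB with an
    intervention budget. All modeling choices here are simplifying
    assumptions for illustration."""

    def __init__(self, n_arms, budget, seed=0):
        self.rng = random.Random(seed)
        self.budget = budget
        # Beta(1, 1) prior on P(adhere next step | acted on this arm).
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select(self):
        """Sample each arm's unknown adherence probability from its
        posterior and act on the arms with the highest draws."""
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        ranked = sorted(range(len(draws)), key=draws.__getitem__,
                        reverse=True)
        return ranked[:self.budget]

    def update(self, arm, adhered):
        """Conjugate posterior update from the observed next state."""
        if adhered:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

# One round: pick arms under the budget, observe, update beliefs.
bandit = ThompsonRMAB(n_arms=3, budget=1)
chosen = bandit.select()
bandit.update(chosen[0], adhered=True)
```

The random posterior draws give exploration for free: arms with few observations have wide posteriors and occasionally produce high samples, so no beneficiary is permanently ignored.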


Open-Vocabulary Argument Role Prediction for Event Extraction

Jiao, Yizhu, Li, Sha, Xie, Yiqing, Zhong, Ming, Ji, Heng, Han, Jiawei

arXiv.org Artificial Intelligence

The argument role in event extraction refers to the relation between an event and an argument participating in it. Despite the great progress in event extraction, existing studies still depend on roles pre-defined by domain experts. These studies expose an obvious weakness when extending to emerging event types or new domains without available roles. Therefore, more attention and effort need to be devoted to automatically customizing argument roles. In this paper, we define this essential but under-explored task: open-vocabulary argument role prediction. The goal of this task is to infer a set of argument roles for a given event type. We propose a novel unsupervised framework, RolePred, for this task. Specifically, we formulate the role prediction problem as an in-filling task and construct prompts for a pre-trained language model to generate candidate roles. By extracting and analyzing the candidate arguments, the event-specific roles are further merged and selected. To standardize the research of this task, we collect a new event extraction dataset from Wikipedia including 142 customized argument roles with rich semantics. On this dataset, RolePred outperforms the existing methods by a large margin. Source code and dataset are available on our GitHub repository: https://github.com/yzjiao/RolePred
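The in-filling formulation can be sketched concretely: build cloze prompts for an event type, collect the role names a language model proposes for the masked slot, then merge near-duplicates. The templates and the normalization rule below are hypothetical stand-ins for RolePred's actual prompting and argument-based merging.

```python
# Cloze templates an LM would complete with candidate role names
# (hypothetical wording; RolePred's real prompts differ).
TEMPLATES = [
    "In a {e} event, the [MASK] is directly involved.",
    "A {e} event typically takes place at some [MASK].",
    "The [MASK] is affected by the {e} event.",
]

def role_prompts(event_type):
    """One prompt per template for a given open-vocabulary event type."""
    return [t.format(e=event_type) for t in TEMPLATES]

def merge_roles(candidates):
    """Collapse near-duplicate candidates by crude normalization
    (case folding plus trailing-'s' stripping) -- a stand-in for
    RolePred's argument-based merging and selection."""
    seen, merged = set(), []
    for role in candidates:
        key = role.lower().rstrip("s")
        if key not in seen:
            seen.add(key)
            merged.append(role.lower())
    return merged

print(role_prompts("award")[0])
print(merge_roles(["Winner", "winners", "Prize", "Host", "prizes"]))
# → ['winner', 'prize', 'host']
```

In the full pipeline, each surviving role name would be validated by checking whether actual argument spans for it can be extracted from documents of that event type; the trivial string merge here only illustrates the deduplication step.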